feat(codeceptq): CLI to query HTML with CodeceptJS locators #5550

Merged
DavertMik merged 16 commits into 4.x from feat/cq-parser
May 5, 2026
@DavertMik
Contributor

Summary

Adds codeceptq — a standalone CLI that takes an HTML stream (stdin or --file) plus a CodeceptJS locator (CSS / XPath / fuzzy / semantic) and prints matched elements with line numbers and outerHTML snippets.

Designed for AI agents iterating on locators against aiTrace's per-step *_page.html snapshots: "would this locator match at step N?" — answered in milliseconds, no browser, no re-run.

# CSS / XPath
cat output/trace_*/0007_*_page.html | npx codeceptq './/input[@required]'
npx codeceptq '#submit' --file output/trace_*/0007_*_page.html

# semantic locators (label / button text / option / checkbox)
npx codeceptq 'Email' --field --file output/trace_*/0003_*_page.html
npx codeceptq 'Save' '.modal' --click --file output/trace_*/0005_*_page.html

# JSON for scripting
npx codeceptq 'Username' --field --json --file output/trace_*/0002_*_page.html

Changes

  • New bin/codeceptq.js (registered as codeceptq in package.json#bin).
  • New lib/command/query.js — parse5-tracked line numbers + xmldom xpath eval. Reuses Locator for CSS→XPath and semantic builders (Locator.field.byText, Locator.clickable.wide, Locator.checkable.byText, Locator.select.byVisibleText).
  • lib/html.js#formatHtml now passes inline: [] to js-beautify so every element in trace HTML lands on its own line — line numbers from codeceptq map 1:1 to elements.
  • xpath@0.0.34 promoted from devDependencies → dependencies (already in tree).
  • Default --snippet length 500 chars; --full for complete outerHTML; --json for tooling.
  • Exit codes: 0 match, 1 no match, 2 invalid input/XPath.
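
For a script or agent driving the CLI, the exit codes and `--json` output listed above could be interpreted roughly as follows. This is a sketch under assumptions, not code from this PR: `interpretRun` is a hypothetical helper, and the top-level `matches` array of `{ line, snippet }` objects is inferred from the test excerpt in this description rather than from a documented schema.

```javascript
// Hypothetical helper: interpret one `codeceptq ... --json` run.
// `status` is the process exit code, `stdout` the captured output.
function interpretRun({ status, stdout }) {
  if (status === 2) {
    // Exit code 2: invalid input or invalid XPath.
    throw new Error('codeceptq rejected the input or locator');
  }
  if (status === 1) {
    // Exit code 1: the locator is valid but matched nothing.
    return { matched: false, matches: [] };
  }
  if (status !== 0) {
    throw new Error(`unexpected exit code: ${status}`);
  }
  // Exit code 0: at least one match. The JSON shape is an assumption.
  const parsed = JSON.parse(stdout);
  return { matched: true, matches: parsed.matches };
}

// Example with canned output instead of a real process.
const sample = interpretRun({
  status: 0,
  stdout: JSON.stringify({ matches: [{ line: 87, snippet: '<input id="firstName">' }] }),
});
console.log(sample.matches.map((m) => m.line));
```

Keeping "no match" (1) separate from "bad locator" (2) lets a caller retry with a different locator in the first case and fix its input in the second.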

Tests

test/runner/codeceptq_test.js — 45 tests against test/data/{checkout,github,gitlab,app/drag_drop}.html. Each assertion shows the expected { line, snippet } inline so the test source is also a behavior spec:

expect(parsed.matches).toEqual([
  { line: 87, snippet: '<input type="text" class="form-control" id="firstName" placeholder="" value="" required>' },
  { line: 94, snippet: '<input type="text" class="form-control" id="lastName" placeholder="" value="" required>' },
  // ...
])

Coverage: XPath, CSS (id/class/attr/forced), --field, --click/--clickable, --checkable, --select, fuzzy auto-detect, context scoping, stdin vs --file, --limit, --snippet, --full, --json, exit codes, large fixtures.

Test plan

  • npx mocha test/runner/codeceptq_test.js → 45 passing
  • npx mocha test/unit/html_test.js test/unit/utils/trace_test.js → existing tests still pass with new inline: []
  • npx eslint bin/codeceptq.js lib/command/query.js test/runner/codeceptq_test.js → clean
  • Smoke against real examples/output/trace_*/*_page.html — finds all 17 inputs with line numbers
  • Reviewer: try npx codeceptq 'something' --file output/trace_*/<step>_page.html after a real test run

🤖 Generated with Claude Code

DavertMik and others added 16 commits April 26, 2026 22:27
Adds `codeceptq` — a standalone CLI that takes an HTML stream (stdin or
--file) plus a CodeceptJS locator (CSS / XPath / fuzzy / semantic) and
prints matched elements with line numbers and outerHTML snippets.
Designed to give AI agents a fast feedback loop against `aiTrace`'s
per-step HTML snapshots: "would this locator match at step N?" without
re-running the test or spawning a browser.

- Reuses Locator class for CSS→XPath conversion + semantic builders
  (--field, --click, --checkable, --select).
- Optional context arg scopes matches: `codeceptq 'Save' '.modal' --click`.
- Stable output flags: --limit, --snippet (default 500), --full, --json.
- Exit codes: 0 match, 1 no match, 2 invalid input/XPath.
- formatHtml now uses `inline: []` so every element gets its own line in
  trace HTML — line numbers map 1:1 to elements for codeceptq output.
- 45 runner tests against test/data/checkout.html, github.html,
  gitlab.html, drag_drop.html assert exact line + snippet for every
  locator strategy.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
run_test, run_step_by_step, and pausedPayload now include aiTraceDir
(the per-test output/trace_<title>_<hash>/ folder) so agents can point
codeceptq directly at the saved *_page.html snapshots without globbing
or recomputing the hash. Per-test entries in reporterJson.tests[] also
carry the dir.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
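
With `aiTraceDir` in hand, a caller could enumerate the saved page snapshots directly. The sketch below is illustrative only: `listPageSnapshots` is a hypothetical helper, the file-name pattern is assumed from the `0007_*_page.html` examples in the PR description, and the demo file names are invented.

```javascript
const fs = require('fs');
const os = require('os');
const path = require('path');

// Hypothetical helper: list the per-step page snapshots in a trace folder.
// Assumes zero-padded step prefixes, so a plain string sort gives step order.
function listPageSnapshots(aiTraceDir) {
  return fs
    .readdirSync(aiTraceDir)
    .filter((name) => /^\d+_.*_page\.html$/.test(name))
    .sort()
    .map((name) => path.join(aiTraceDir, name));
}

// Demo against a throwaway folder standing in for a real aiTraceDir.
const demoDir = fs.mkdtempSync(path.join(os.tmpdir(), 'trace_demo_'));
for (const name of ['0002_fill_page.html', '0001_open_page.html', '0001_open.png']) {
  fs.writeFileSync(path.join(demoDir, name), '');
}
const snapshots = listPageSnapshots(demoDir).map((p) => path.basename(p));
console.log(snapshots);
```

Each returned path can then be passed to `npx codeceptq '<locator>' --file <path>`.
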
The 'Sign up' --click case on github.html (2k-line fixture, 12-branch
semantic union XPath) takes ~8s locally and exceeds the default 10s
mocha timeout on slower CI runners. Suite-level timeout matches what
the local runs already use.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Locator.clickable.wide and field.labelContains emit predicates of form
[@aria-labelledby = //*[@id][normalize-space(string(.)) = 'X']/@id ].
xpath@0.0.34 re-runs the inner //* scan once per outer element match —
O(N²) on non-trivial docs. The 2k-line github fixture spent 8.5s in
that single branch out of 12.

Pre-resolve the inner subquery once, splice the resulting id (or a
sentinel for no-match) back as a literal so the engine sees a flat
attribute compare.

Github 'Sign up' --click: 9026ms → 276ms (~33×).
Full runner suite: 14s → 6s.

Reverts the 30s describe-level timeout from the previous commit since
the underlying perf issue is now fixed.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
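
The cost difference described in this commit can be illustrated with a toy model. This is not the XPath engine or the code in lib/command/query.js: plain objects stand in for elements, and `matchNaive` / `matchPreResolved` are hypothetical names used only to show why resolving the inner subquery once removes the quadratic behaviour.

```javascript
// Toy document model: each element has optional id, text and ariaLabelledby.
const elements = [
  { id: 'lbl', text: 'Sign up' },
  { id: 'other', text: 'Log in' },
  { tag: 'button', ariaLabelledby: 'lbl' },
  { tag: 'button', ariaLabelledby: 'other' },
  { tag: 'div' },
];

// Naive shape: the inner scan over all elements runs once per outer element.
function matchNaive(els, text) {
  let innerScans = 0;
  const hits = els.filter((el) => {
    innerScans += 1;
    const ids = els.filter((e) => e.id && e.text === text).map((e) => e.id);
    return ids.includes(el.ariaLabelledby);
  });
  return { hits, innerScans };
}

// Pre-resolved shape: run the inner scan once, then compare against literals.
// An empty set plays the role of the "no match" sentinel.
function matchPreResolved(els, text) {
  const ids = new Set(els.filter((e) => e.id && e.text === text).map((e) => e.id));
  const hits = els.filter((el) => ids.has(el.ariaLabelledby));
  return { hits, innerScans: 1 };
}

const naive = matchNaive(elements, 'Sign up');
const fast = matchPreResolved(elements, 'Sign up');
console.log(naive.innerScans, fast.innerScans, fast.hits.length);
```

Both functions select the same elements; only the number of document-wide scans differs (one per element versus one in total).
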
Replaces the post-hoc regex pre-resolver with strategy-level construction.
Each semantic locator (--click/--field/--checkable) is built as a list of
XPath branches; doc-wide subqueries (label[@for] resolution, ids by visible
text) are evaluated once and inlined as literal predicates instead of
sitting nested inside outer per-element predicates that the engine
re-executes on every match.

Eval loop runs each branch separately and sorts results by source offset
to preserve the document-order contract of XPath unions.

Github 'Sign up' --click: 9000ms → 264ms (independent of XPath engine —
fontoxpath benched the same as xpath@0.0.34 on the original union).
All 45 runner tests pass with identical line/snippet output.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
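
The "run each branch, then sort by source offset" step this commit describes might look roughly like the following. It is a sketch only: `mergeBranches` is a hypothetical name, and the `offset` and `line` fields stand in for whatever location data the parser attaches to each node.

```javascript
// Hypothetical shape: each branch returns nodes tagged with the source offset
// at which the element starts.
function mergeBranches(branchResults) {
  const byOffset = new Map();
  for (const nodes of branchResults) {
    for (const node of nodes) {
      // An element matched by several branches is kept once, as in a union.
      if (!byOffset.has(node.offset)) byOffset.set(node.offset, node);
    }
  }
  // Sorting by offset restores document order across branches.
  return [...byOffset.values()].sort((a, b) => a.offset - b.offset);
}

const merged = mergeBranches([
  [{ offset: 420, line: 31 }],
  [{ offset: 95, line: 8 }, { offset: 420, line: 31 }],
  [],
]);
console.log(merged.map((n) => n.line));
```

De-duplicating by offset matters because an XPath union returns each node once even when several branches select it.
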
…cate

The wide clickable / labelContains field XPath includes:
  .//*[@aria-labelledby = //*[@id][normalize-space(string(.)) = X]/@id]

That predicate forces every element to evaluate the inner //*[@id] subquery,
which is O(N²) on any non-trivial document for pure-JS XPath engines (xpath
npm: 7641ms on a 2k-line page; fontoxpath: 7057ms on the same branch).
Browser engines optimize via join-pushdown.

Adding [@aria-labelledby] as a left-to-right filter predicate first cuts
the slow comparison to only elements that actually have the attribute:

  .//*[@aria-labelledby][@aria-labelledby = //*[@id][...]/@id]

7641ms → 52ms (147×). Semantics are identical: for non-positional
predicates such as these, [A][B] and [A and B] select the same nodes, and
because predicates are applied left-to-right, the cheap attribute-existence
check filters out the bulk of elements first.

This is a single-predicate XPath change: codeceptq goes from 9000ms →
325ms on test/data/github.html with no special-case code. Reverted the
per-strategy reimplementation in lib/command/query.js (back to using
Locator.clickable.wide / Locator.field.byText directly).

Added two unit tests for the aria-labelledby branch in
Locator.clickable.wide (positive + negative).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
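
The effect of the guard predicate can be modelled by treating chained predicates as successive filters. This is a toy model, not the XPath engine or the Locator code: `applyPredicates`, `hasAttr` and `labelledBy` are hypothetical names, the document is synthetic, and positional predicates are ignored.

```javascript
// Toy model of predicate chaining: [A][B] behaves like two successive
// filters, so B only runs on elements that already passed A.
function applyPredicates(els, predicates) {
  return predicates.reduce((current, pred) => current.filter(pred), els);
}

// Synthetic document: 1000 plain elements, one label, one labelled button.
const doc = Array.from({ length: 1000 }, () => ({}));
doc.push({ id: 'lbl', text: 'Sign up' }, { ariaLabelledby: 'lbl' });

let expensiveCalls = 0;
const hasAttr = (el) => el.ariaLabelledby !== undefined; // cheap existence check
const labelledBy = (text) => (el) => {
  expensiveCalls += 1; // stands in for the doc-wide //*[@id][...] scan
  return doc.some((e) => e.id !== undefined && e.id === el.ariaLabelledby && e.text === text);
};

// Without the guard, the expensive predicate runs for every element.
const unguarded = applyPredicates(doc, [labelledBy('Sign up')]);
const unguardedCalls = expensiveCalls;

// With the guard, it runs only for elements that carry the attribute.
expensiveCalls = 0;
const guarded = applyPredicates(doc, [hasAttr, labelledBy('Sign up')]);
const guardedCalls = expensiveCalls;

console.log(unguardedCalls, guardedCalls, guarded.length === unguarded.length);
```

In this model the result set is unchanged while the expensive predicate runs only for elements that have the attribute at all.
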
@DavertMik DavertMik merged commit 4f0fa49 into 4.x May 5, 2026
10 checks passed
@DavertMik DavertMik deleted the feat/cq-parser branch May 5, 2026 22:25